00. Create/Clone a github repository

01. Create RStudio Project

02. Setup RStudio

03. Download .CSV Files

The three datasets that will be used are:

The original and reformatted CSV’s can be found in 01 Data folder.

04. ETL & Create Table on Database

CREATE TABLE Refugee_Stats ( Asylum_Country varchar2(4000), Origin_Country varchar2(4000), Record_Year number(38,4), Refugees number(38,4), Asylum_seekers number(38,4), Returned_Refugees number(38,4), IDPs number(38,4), Returned_IDPs number(38,4), Stateless_persons number(38,4), Others_of_concern number(38,4), Total_Population number(38,4) );

TABLE “MILITARIZED_DISPUTES”

STATE_ABBR VARCHAR2(4000 BYTE) Yes 1
DISPNUM3 NUMBER(38,4) Yes 2
DISPNUM4 NUMBER(38,4) Yes 3
COUNTRY_CODE NUMBER(38,4) Yes 4
START_DAY NUMBER(38,4) Yes 5
START_MON NUMBER(38,4) Yes 6
START_YEAR NUMBER(38,4) Yes 7
END_DAY NUMBER(38,4) Yes 8
END_MON NUMBER(38,4) Yes 9
END_YEAR NUMBER(38,4) Yes 10
SIDE_A_BOOL NUMBER(1,0) Yes 11
REVISIONIST_STATE_BOOL NUMBER(1,0) Yes 12
REVISION_TYPE2 NUMBER(38,4) Yes 13
REVISION_TYPE1 NUMBER(38,4) Yes 14
FATALITY_LEVEL NUMBER(38,4) Yes 15
FATALITY_PRECISE NUMBER(38,4) Yes 16
HIGHEST_ACTION NUMBER(38,4) Yes 17
HOSTILITY_LEVEL NUMBER(38,4) Yes 18
ORIGINATOR_BOOL NUMBER(1,0) Yes 19

TABLE “RELIGIONS_BY_NATION”

STATE VARCHAR2(4000 BYTE) Yes 1
NAME VARCHAR2(4000 BYTE) Yes 2
DATATYPE VARCHAR2(4000 BYTE) Yes 3
SOURCERELIAB VARCHAR2(4000 BYTE) Yes 4
RECRELIAB VARCHAR2(4000 BYTE) Yes 5
RELIABILEVEL VARCHAR2(4000 BYTE) Yes 6
VERSION VARCHAR2(4000 BYTE) Yes 7
SOURCECODE VARCHAR2(4000 BYTE) Yes 8
YEAR NUMBER(38,4) Yes 9
CHRSTPROT NUMBER(38,4) Yes 10
CHRSTCAT NUMBER(38,4) Yes 11
CHRSTORTH NUMBER(38,4) Yes 12
CHRSTANG NUMBER(38,4) Yes 13
CHRSTOTHR NUMBER(38,4) Yes 14
CHRSTGEN NUMBER(38,4) Yes 15
JUDORTH NUMBER(38,4) Yes 16
JDCONS NUMBER(38,4) Yes 17
JUDREF NUMBER(38,4) Yes 18
JUDOTHR NUMBER(38,4) Yes 19
JUDGEN NUMBER(38,4) Yes 20
ISLMSUN NUMBER(38,4) Yes 21
ISLMSHI NUMBER(38,4) Yes 22
ISLMIBD NUMBER(38,4) Yes 23
ISLMNAT NUMBER(38,4) Yes 24
ISLMALW NUMBER(38,4) Yes 25
ISLMAHM NUMBER(38,4) Yes 26
ISLMOTHR NUMBER(38,4) Yes 27
ISLMGEN NUMBER(38,4) Yes 28
BUDMAH NUMBER(38,4) Yes 29
BUDTHR NUMBER(38,4) Yes 30
BUDOTHR NUMBER(38,4) Yes 31
BUDGEN NUMBER(38,4) Yes 32
ZOROGEN NUMBER(38,4) Yes 33
HINDGEN NUMBER(38,4) Yes 34
SIKHGEN NUMBER(38,4) Yes 35
SHNTGEN NUMBER(38,4) Yes 36
BAHGEN NUMBER(38,4) Yes 37
TAOGEN NUMBER(38,4) Yes 38
JAINGEN NUMBER(38,4) Yes 39
CONFGEN NUMBER(38,4) Yes 40
SYNCGEN NUMBER(38,4) Yes 41
ANMGEN NUMBER(38,4) Yes 42
NONRELIG NUMBER(38,4) Yes 43
OTHRGEN NUMBER(38,4) Yes 44
SUMRELIG NUMBER(38,4) Yes 45
POP NUMBER(38,4) Yes 46
CHRSTPROTPCT NUMBER(38,4) Yes 47
CHRSTCATPCT NUMBER(38,4) Yes 48
CHRSTORTHPCT NUMBER(38,4) Yes 49
CHRSTANGPCT NUMBER(38,4) Yes 50
CHRSTOTHRPCT NUMBER(38,4) Yes 51
CHRSTGENPCT NUMBER(38,4) Yes 52
JUDORTHPCT NUMBER(38,4) Yes 53
JUDCONSPCT NUMBER(38,4) Yes 54
JUDREFPCT NUMBER(38,4) Yes 55
JUDOTHRPCT NUMBER(38,4) Yes 56
JUDGENPCT NUMBER(38,4) Yes 57
ISLMSUNPCT NUMBER(38,4) Yes 58
ISLMSHIPCT NUMBER(38,4) Yes 59
ISLMIBDPCT NUMBER(38,4) Yes 60
ISLMNATPCT NUMBER(38,4) Yes 61
ISLMALWPCT NUMBER(38,4) Yes 62
ISLMAHMPCT NUMBER(38,4) Yes 63
ISLMOTHRPCT NUMBER(38,4) Yes 64
ISLMGENPCT NUMBER(38,4) Yes 65
BUDMAHPCT NUMBER(38,4) Yes 66
BUDTHRPCT NUMBER(38,4) Yes 67
BUDOTHRPCT NUMBER(38,4) Yes 68
BUDGENPCT NUMBER(38,4) Yes 69
ZOROGENPCT NUMBER(38,4) Yes 70
HINDGENPCT NUMBER(38,4) Yes 71
SIKHGENPCT NUMBER(38,4) Yes 72
SHNTGENPCT NUMBER(38,4) Yes 73
BAHGENPCT NUMBER(38,4) Yes 74
TAOGENPCT NUMBER(38,4) Yes 75
JAINGENPCT NUMBER(38,4) Yes 76
CONFGENPCT NUMBER(38,4) Yes 77
SYNCGENPCT NUMBER(38,4) Yes 78
ANMGENPCT NUMBER(38,4) Yes 79
NONRELIGPCT NUMBER(38,4) Yes 80
OTHRGENPCT NUMBER(38,4) Yes 81
SUMRELIGPCT NUMBER(38,4) Yes 82
TOTAL NUMBER(38,4) Yes 83
DUALRELIG NUMBER(38,4) Yes 84

05. Visualizations

Quan’s Stories

A. Start with a green thing in non-aggregated mode and make a boxplot

For Religions by Nation dataset:

  • Put Name into Columns
  • Put Budgen into Rows
  • Choose Boxplot in Show Me

  • Make a new barchart: put Name into Columns, Budgen into Rows
  • Pick out interesting countries (filtering by Name as CHN, DRV, JPN, MYA, THI)
  • Filter by Year greater than or equal to 1955 (to eliminate null values)
  • Put Year in Pages to create a timeline slider
  • Put Name in Marks as Color for labeling

Interesting Findings:

  • China vs Japan: Even though China and Japan has a striking difference in total population, for the early period (1945 – 1970), their Buddhist population was comparable (with Japan’s being little higher). However, from 1970 onwards, Buddhist population in China increased markedly and steadily, outstripping Japan completely.

  • The interesting point about China’s increase is that even though it was rapid, there was a sudden drop (by no less than 50%) between 1995 and 2000, followed by a sudden comeback and plateau right in 2005.

  • For Vietnam, after long period of slow but steady increase in Buddhist population between 1955-1995, there appears a sharp increase and plateau in 2000 (from 6.5M to 40M between 1995 and 2000).

B. Start with two green things and a map

For Religions by Nation dataset:

  • Choose Name’s Geographic Role as Country/Region, then Add to Sheet
  • Filter by Nonreligpct to be at least 15% (eliminating insignificant data)
  • Put Year in Pages to create a timeline slider
  • For Marks, put Name into Color as labeling, and Nonreligpct into Label for showing values

  • Create a new scatterplot: put Year into Columns and Nonreligpct into Rows
  • Put Name into Columns for different facets
  • Pick out interesting countries (filtering by Name as CHN, CUB, CZE, FIN, JAM, PRK, RUS, USA)
  • Put Name into Color in Marks for labeling

Interesting Findings:

  • The behavior of nonreligious percentage in Czechoslovakia, Finland, and USA are considered normal. Thus, they are showned just for comparison.

  • The interesting behaviors are in countries that have an abrupt change in trend, or a sudden increase/decrease in nonreligious percentage of at least 10%.

  • Russia: sudden drop from 63% in 1995 to 12% in 2000.

  • North Korea: 79% in 1995 to 64% in 2000.

  • Jamaica: downward slope from 1960-1980, but a sharp turn upwards to 1990 and then gradual decrease.

  • Cuba: sharp increase from 0% in 1965 to 25% in 1970.

  • China: sharp increase in period 1945-1960.

C. Start with a green thing in aggregated mode and make a histogram

For Militarized Disputes dataset:

  • Create a calculated field named Dispute Length = [End Year] - [Start Year] + 1
  • Choose Histogram in Show Me
  • In Dimensions, choose to edit Dispute Length (bin) and change bin size to 1

Comment: Result shows highest count of dispute length to be for 1 year, and the count decreases significantly as dispute length increases, which is expected and not interesting.

Denominational Dispersions

1995

2000

  • Where’d all the ‘other’s go? It would appear from this graph that between 1995 and 2000 an astonishing number of ’other’ (unafilliated or less-represented denominations) Muslims became either Sunni or Shiite – and mostly Sunni, by a large margin.

  • I decided to delve further, and look at each country individually. There, a pattern emerged, as you can see below:

  • This is not a universal pattern, but you’ll see that more than 40 countries show a near-total reorganization of their Muslim population into Sunni/Shiite between 1995 and 2000.

  • And if that’s not odd enough, 6 Buddhist countries show the same reorganization:

  • Perhaps this was a regional shift? Let’s check:

  • Nope! There’s a moderate geographic correlation with the Buddhist shift, but with two major outliers (33% of the shift), and there’s no regional correlation at all with the Islamic shift.

  • I found myself trawling Wikipedia and world news articles for major events in the Muslim and Buddhist worlds, but that quickly delved to the deepest reaches of conjecture. All we can say for sure from this data is: something massively significant happened in these two religious communities in the last half of the 90’s.

War and those who flee it

Started with the Military Disputes dataset.

Wanted to take a look at the military disputes data. A couple key terms…

Made a crosstab to look at the different disputes.
- To get country names, blended data with a dataset that connected country abbreviations to their full name.

The military dispute dataset.

Okay, so can we correlate this data to the refugee movement out of these countries to see if we find any trends?

-The Refugee dataset.

Looking for trends… Method:
- Inner join on our refugee data and the military dispute data.
- Cross tab, using the KPI of percentage of a country’s population that moved out of country as refugees in the same year as a dispute.

  • Unfortunatly, there doesn’t seem to be a great many intersting trends. The data is terribly convaluted… What if we look by country, instead of by dispute number?

  • So still not many trends, but the data China is interesting. A novice mistake would be to say that these conflicts created over a million refugees. BUT, we must remember that this is refugees in the same year as the conflict, not due to the conflict itself!

  • As it turns out, some of these conflicts that are rated as “Wars”, actually only lasted one day! But, its still interesting that China had over a million refugees that year…

So lets look at China, decade by decade…

Method:
- Filter by years and State name.
- Include a term to track the number of conflicts through the decade.

1960-1969

  • Look at all those refugees! So what is happening? As it turns out, these years correlate exactly to the Cultural Revolution. It seems likely that these “refugees” are actually just displaced populuce combined with a flight out of an increasingly restrictive regeim.

1970-1979

  • A very quiet decade. I suspect that a problem with the joins caused so few disputes to show up. Still a decent ammount of refugees, but nothing compared to what we saw earlier.

1980-1989

  • In the 1980’s we see an increasing number of conflicts, but almost no refugees whatsoever! We also notice that a great many of the conflicts are “Revisionist” conflicts: basically small civil “wars” or revolutions.

1990-1991

  • Now we see a huge change in 1993! A huge increase in the number of refugees to nearly one-hundred thousand per year. The conflicts weren’t particularly deadly, and only about half of them were revisionist. This trend continues until 2008, the end of our data…

2000-2008

  • The trend starting in the 1990’s continues, and we reach the end of our data.

=======

06. Shiny App for Interesting Visualizations

https://renaissance8905.shinyapps.io/War_And_Pear